MPI Performance Comparison on Distributed and Shared Memory Machines

Author

  • Tom Loos
Abstract

The widely implemented MPI Standard [10] defines primitives for point-to-point inter-processor communication (IPC), collective IPC, and synchronization based on message passing. The main reason to use a message-passing standard is to ease the development, porting, and execution of applications on the variety of parallel computers that can support the paradigm, including shared memory, distributed memory, and shared memory array multiprocessors. This paper compares the SGI Power Challenge, a shared memory multiprocessor, with the Intel Paragon, a distributed memory machine. This paper addresses two questions: How do MPI and memory performance compare on the SGI Power Challenge for various message sizes and numbers of CPUs? Can MPI's relative performance on the SGI Power Challenge also be used to predict performance on shared memory, distributed memory, and shared memory array multiprocessors? Memory and communication tests written in C++ using messages of double-precision arrays show that both memory and MPI blocking IPC performance on the Power Challenge degrade once total message sizes grow larger than the second-level cache. Comparing the MPI and memory performance curves indicates that Power Challenge native MPI point-to-point communication is implemented using memory copying and that synchronization is implemented using a binary tree algorithm. A model of blocking IPC for the SGI Power Challenge was developed and validated with performance results. A new measure of communication efficiency and overhead, the ratio of IPC time to memory copy time, is used to compare relative IPC performance for different machines. Comparison of the Power Challenge and the Paragon shows that the Paragon is more efficient for small messages, but the Power Challenge is better for large messages.
Power Challenge observations do not generally correspond with those on the Paragon, indicating that shared memory multiprocessor results should not be used to predict distributed memory multiprocessor performance. This suggests that parallel algorithms should not be judged solely on their performance on one type of machine.

Similar articles

Comparison of MPI Benchmark Programs on Shared Memory and Distributed Memory Machines (Point-to-Point Communication)

There are several benchmark programs available to measure the performance of MPI on parallel computers. The most commonly used MPI benchmark suites are SKaMPI, Pallas MPI Benchmark, MPBench, Mpptest, and MPIBench. It is interesting to analyze the differences between these benchmarks; presently, few such comparisons have been done. Thus, in this paper we di...


A comparison of MPI performance on

Since MPI [1] has become a standard for message passing on distributed memory machines, a number of implementations have evolved. Today there is an MPI implementation available for all relevant MPP systems, a number of which are based on MPICH [2]. In this paper we are going to present performance comparisons for several implementations of MPI on different MPPs. Results for the Cray T3E, the IBM RS/6...


Parallel Performance Study of Monte Carlo Photon Transport Code on Shared-, Distributed-, and Distributed-Shared-Memory Architectures

We have parallelized a Monte Carlo photon transport algorithm. Three different parallel versions of the algorithm were developed. The first version is for the Tera Multi-Threaded Architecture (MTA) and uses Tera specific directives. The second version, which uses MPI library calls, has been implemented on both the CRAY T3E and the 8-way SMP IBM SP with Power3 processors. The third version is a ...


Enhancing Application Performance Using Mini-apps: Comparison of Hybrid Parallel Programming Paradigms

In many fields, real-world applications for High Performance Computing have already been developed. For these applications to stay up-to-date, new parallel strategies must be explored to yield the best performance; however, restructuring or modifying a real-world application may be daunting depending on the size of the code. In this case, a mini-app may be employed to quickly explore such optio...



Journal title:

Volume   Issue

Pages  -

Publication date: 1996